Method for reducing noise in speech signal and method for detecting noise domain.



A noise reducing method for speech signals is provided in which the
probability of speech occurring is calculated by spectral subtraction,
that is, by subtracting the estimated noise spectrum from the spectrum
of the input signal, and the maximum likelihood filter is adaptively
controlled based upon the calculated speech occurrence probability.
Adjustment to an optimum suppression factor may be achieved depending
on the SNR of the input speech signal, so that it is unnecessary for
the user to effect adjustment prior to practical application. In
addition, a method for detecting the noise domain is provided in which
the value th employed for finding the threshold value Th1 for noise
domain discrimination is calculated using the RMS value of the current
frame or the value th of the previous frame multiplied by the
coefficient α, whichever is smaller, and the coefficient α is changed
over depending on the RMS value of the current frame. Noise domain
discrimination by an optimum threshold value responsive to the input
signal may be achieved without producing mistaken judgment even on the
occasion of noise level fluctuations.
This invention relates to a method for reducing the noise in speech signals and a method for detecting the
noise domain. More particularly, it relates to a method for reducing the noise in the speech signal, in which
noise suppression is achieved by adaptively controlling a maximum likelihood filter for calculating speech
components based upon the speech presence probability and the SN ratio calculated on the basis of input speech
signals, and a noise domain detection method which may be conveniently applied to the noise reducing method.

In a portable telephone or speech recognition, it is thought to be necessary to suppress environmental 
noise or background noise contained in the collected speech signals and to enhance the speech components. 

As techniques for enhancing the speech or reducing the noise, those employing a conditional probability
function for adjusting the attenuation factor are shown in R. J. McAulay and M. L. Malpass, Speech Enhancement
Using a Soft-Decision Noise Suppression Filter, IEEE Trans. Acoust., Speech, Signal Processing, Vol. 28,
pp. 137-145, April 1980, and J. Yang, Frequency Domain Noise Suppression Approach in Mobile Telephone
System, IEEE ICASSP, vol. II, pp. 363-366, April 1993.

With these noise suppression techniques, an unnatural speech tone or distorted speech is frequently
produced because the operation is based on an inappropriate fixed signal-to-noise (S/N) ratio or on an
inappropriate suppression factor. In actual application, it is not desirable for the user to have to adjust
the S/N ratio, which is among the parameters of the noise suppression system, for achieving an optimum
performance. In addition, it is difficult with the conventional speech signal enhancement techniques to
remove the noise sufficiently without producing, as a by-product, distortion of speech signals that are
susceptible to considerable fluctuations in the short-term S/N ratio.

With the above-described speech enhancement or noise reducing method, the technique of detecting the
noise domain is employed, in which the input level or power is compared to a pre-set threshold for discriminating
the noise domain. However, if the time constant of the threshold value is increased for preventing tracking to
the speech, it becomes impossible to follow noise level changes, especially increases in the noise level, thus
leading to mistaken discrimination.

In view of the foregoing, it is an object of the present invention to provide a method for reducing the noise
in speech signals whereby the suppression factor is adjusted to a value optimized with respect to the S/N ratio
of the actual input responsive to the input speech signals and sufficient noise removal may be achieved without
producing distortion as a secondary effect and without the necessity of pre-adjustment by the user.

It is another object of the present invention to provide a method for detecting the noise domain whereby
noise domain discrimination may be achieved based upon an optimum threshold value responsive to the input
signal and mistaken discrimination may be eliminated even on the occasion of noise level fluctuations.

In one aspect, the present invention provides a method for reducing the noise in an input speech signal
in which noise suppression is done by adaptively controlling a maximum likelihood filter adapted for calculating
speech components based on the speech presence probability and the S/N ratio calculated based on the input
speech signal. Specifically, the spectral difference, that is, the spectrum of an input signal less an estimated
noise spectrum, is employed in calculating the probability of speech occurrence.

Preferably, the value of the above spectrum difference or a pre-set value, whichever is larger, is employed
for calculating the probability of speech occurrence. Preferably, the value of the above difference or a pre-set
value, whichever is larger, is calculated for the current frame and for a previous frame, the value for the previous
frame is multiplied with a pre-set decay coefficient, and the value for the current frame or the value for the
previous frame multiplied by a pre-set decay coefficient, whichever is larger, is employed for calculating the
speech presence probability.

The characteristics of the maximum likelihood filter are processed with smoothing filtering along the
frequency axis or along the time axis. Preferably, a median value of characteristics of the maximum likelihood
filter in the frequency range under consideration and characteristics of the maximum likelihood filter in
neighboring left and right frequency ranges is used for smoothing filtering along the frequency axis.

In another aspect, the present invention provides a method for detecting a noise domain by dividing an
input speech signal on the frame basis, finding an RMS value on the frame basis and comparing the RMS
values to a threshold value Th1 for detecting the noise domain. Specifically, a value th for finding the threshold
Th1 is calculated using the RMS value for the current frame and a value th of the previous frame multiplied
by a coefficient α, whichever is smaller, and the coefficient α is changed over depending on an RMS value of
the current frame. In the following embodiment, the threshold value Th1 is NoiseRMS_thres[k], while the value
th for finding it is MinNoise_short[k], k being a frame number. As will be explained in the equation (7), the value
of the previous frame MinNoise_short[k-1] multiplied by the coefficient α[k] is compared to the RMS value
RMS[k] of the current frame and a smaller value of the two is set to MinNoise_short[k]. The coefficient α[k]
is changed over from 1 to 0 or vice versa depending on the RMS value RMS[k].

Preferably, the value th for finding the threshold Th1 may be a smaller one of the RMS value for the current
frame and a value th of the previous frame multiplied by a coefficient α, that is MinNoise_short[k] as later
explained, or the smallest RMS value over plural frames, that is MinNoise_long[k], whichever is larger.

Also, the noise domain is detected based upon the results of discrimination of the relative energy of the
current frame using the threshold value Th2 calculated using the maximum SN ratio of the input speech signal
and the results of comparison of the RMS value to the threshold value Th1. In the following embodiment, the
threshold value Th2 is dBthres_rel[k], with the frame-based relative energy being dB_rel. The relative energy dB_rel
is a relative value with respect to a local peak of the directly previous signal energy and describes the current
signal energy.

The above-described noise domain detection method is preferably employed in the noise reducing method 
for speech signals according to the present invention. 

With the noise reducing method for speech signals according to the present invention, since the speech 
presence probability is calculated by spectral subtraction of subtracting the estimated noise spectrum from 
the spectrum of the input signal, and the maximum likelihood filter is adaptively controlled based upon the 
calculated speech presence probability, adjustment to an optimum suppression factor may be achieved de- 
pending on the SNR of the input speech signal, so that it is unnecessary for the user to effect adjustment prior 
to practical application. 

In addition, with the method for detecting the noise domain according to the present invention, since the
value th employed for finding the threshold value Th1 for noise domain discrimination is calculated using the
RMS value of the current frame or the value th of the previous frame multiplied by the coefficient α, whichever
is smaller, and the coefficient α is changed over depending on the RMS value of the current frame, noise
domain discrimination by an optimum threshold value responsive to the input signal may be achieved without
producing mistaken judgment even on the occasion of noise level fluctuations.

The invention will be further described by way of non-limitative example, with reference to the
accompanying drawings, in which:-

Fig.1 is a block circuit diagram for illustrating a circuit arrangement for carrying out the noise reducing
method for speech signals according to an embodiment of the present invention.

Fig.2 is a block circuit arrangement showing an illustrative example of a noise estimating circuit employed
in the embodiment shown in Fig.1.

Fig.3 is a graph showing illustrative examples of an energy E[k] and a decay energy E_decay[k] in the
embodiment shown in Fig.1.

Fig.4 is a graph showing illustrative examples of the short-term RMS value RMS[k], minimum noise RMS
values MinNoise[k] and the maximum signal RMS values MaxSignal[k] in the embodiment shown in Fig.1.

Fig.5 is a graph showing illustrative examples of the relative energy in dB dB_rel[k], maximum SN ratio
MaxSNR[k] and dBthres_rel[k] as one of threshold values for noise discrimination.

Fig.6 is a graph for illustrating NR_level[k] as a function defined with respect to the maximum SNR value
MaxSNR[k] in the embodiment shown in Fig.1.

Referring to the drawings, a preferred illustrative embodiment of the noise reducing method for speech
signals according to the present invention is explained in detail.

In Fig.1, a schematic arrangement of the noise reducing device for carrying out the noise reducing method
for speech signals according to the preferred embodiment of the present invention is shown in a block circuit
diagram.

Referring to Fig. 1, an input signal y[t] containing a speech component and a noise component is supplied
to an input terminal 11. The input signal y[t], which is a digital signal having the sampling frequency of FS, is
fed to a framing/windowing circuit 12 where it is divided into frames each having a length equal to FL samples
so that the input signal is subsequently processed on the frame basis. The framing interval, which is the amount
of frame movement along the time axis, is FI samples, such that the (k+1)th frame is started after FI samples
as from the k-th frame. Prior to processing by a fast Fourier transform (FFT) circuit 13, the next downstream
side circuit, the framing/windowing circuit 12 performs windowing of the frame-based signals by a windowing
function W_input. Meanwhile, after inverse FFT or IFFT at the final stage of signal processing of the frame-based
signals, an output signal is processed by windowing by a windowing function W_output. Examples of the windowing
functions W_input and W_output are given by the following equations (1) and (2):

$$W_{input}[j] = \left(\frac{1}{2} - \frac{1}{2}\cos\left(\frac{2\pi j}{FL}\right)\right)^{1/4}, \quad 0 \le j \le FL \qquad (1)$$

$$W_{output}[j] = \left(\frac{1}{2} - \frac{1}{2}\cos\left(\frac{2\pi j}{FL}\right)\right)^{3/4}, \quad 0 \le j \le FL \qquad (2)$$

If the sampling frequency FS is 8000 Hz = 8 kHz, and the framing interval FI is 80 and 160 samples, the
framing interval is 10 msec and 20 msec, respectively.
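
The framing arithmetic above can be checked with a short sketch. This is only an illustration under stated assumptions: the helper names `frame_interval_ms` and `split_into_frames` are hypothetical, and the frame length of 160 samples in the usage lines is an assumed value, since the text does not fix FL.

```python
def frame_interval_ms(fi_samples, fs_hz):
    """Duration of one framing interval (FI samples) in milliseconds."""
    return 1000.0 * fi_samples / fs_hz


def split_into_frames(y, fl, fi):
    """Cut y into frames of fl samples, each starting fi samples after the previous one."""
    return [y[start:start + fl] for start in range(0, len(y) - fl + 1, fi)]


# FS = 8000 Hz as in the text; FL = 160 is an assumed frame length.
print(frame_interval_ms(80, 8000))
print(frame_interval_ms(160, 8000))
frames = split_into_frames(list(range(400)), fl=160, fi=80)
print(len(frames), frames[1][0])
```

With FI smaller than FL the frames overlap, which is what the later overlap-and-add stage relies on.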

The FFT circuit 13 performs FFT at 256 points to produce frequency spectral amplitude values which are
divided by a frequency dividing circuit 14 into e.g., 18 bands. The following Table 1 shows examples of the
frequency ranges of respective bands.



TABLE 1

Band Number    Frequency Ranges
 9             1563 - 1813 Hz
10             1813 - 2063 Hz
11             2063 - 2313 Hz
12             2313 - 2563 Hz
13             2563 - 2813 Hz
14             2813 - 3063 Hz
15             3063 - 3375 Hz
16             3375 - 3688 Hz
17             3688 - 4000 Hz



These frequency bands are set on the basis of the fact that the perceptive resolution of the human auditory
system is lowered towards the higher frequency side. As the amplitudes of the respective ranges, the maximum
FFT amplitudes in the respective frequency ranges are employed.

A noise estimation circuit 15 distinguishes the noise in the input signal y[t] from the speech and detects
a frame which is estimated to be the noise. The operation of estimating the noise domain or detecting the noise
frame is performed by combining three kinds of detection operations. An illustrative example of noise domain
estimation is hereinafter explained by referring to Fig.2.

In this figure, the input signal y[t] entering the input terminal 11 is fed to a root-mean-square value (RMS)
calculating circuit 15A where short-term RMS values are calculated on the frame basis. An output of the RMS
calculating circuit 15A is supplied to a relative energy calculating circuit 15B, a minimum RMS calculating circuit
15C, a maximum signal calculating circuit 15D and a noise spectrum estimating circuit 15E. The noise
spectrum estimating circuit 15E is fed with outputs of the relative energy calculating circuit 15B, minimum RMS
calculating circuit 15C and the maximum signal calculating circuit 15D, while being fed with an output of the
frequency dividing circuit 14.

The RMS calculating circuit 15A calculates RMS values of the frame-based signals. The RMS value RMS[k]
of the k-th frame is calculated by the following equation:

$$RMS[k] = \sqrt{\frac{1}{FL}\sum_{t=0}^{FL-1} (y[t])^2} \qquad (3)$$



The relative energy calculating circuit 15B calculates the relative energy dB_rel[k] of the k-th frame pertinent
to the decay energy from a previous frame. The relative energy dB_rel[k] in dB is calculated by the following
equation (4):

$$dB_{rel}[k] = 10 \log_{10}\left(\frac{E_{decay}[k]}{E[k]}\right) \qquad (4)$$

In the above equation (4), the energy value E[k] and the decay energy value E_decay[k] may be found
respectively by the equations (5) and (6):

$$E[k] = \sum_{t=0}^{FL-1} (y[t])^2 \qquad (5)$$

$$E_{decay}[k] = \max\left(E[k],\ e^{-FL/(0.65 \cdot FS)}\, E_{decay}[k-1]\right) \qquad (6)$$

Since the equation (5) may be represented by FL·(RMS[k])², an output RMS[k] of the RMS calculating circuit
15A may be employed. However, the value of the equation (5), obtained in the course of calculation of the
equation (3) in the RMS calculating circuit 15A, may be directly transmitted to the relative energy calculating circuit
15B. In the equation (6), the decay time is set to 0.65 sec only by way of an example.
Fig.3 shows illustrative examples of the energy E[k] and the decay energy E_decay[k].
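
The frame energy, RMS, decay energy and relative energy described above can be sketched as follows. This is a minimal illustration, not the circuit itself: the function names are hypothetical, the per-frame decay factor is passed in as a parameter (one plausible choice, assumed here from the stated 0.65 sec decay time, is exp(-FL / (0.65 * FS))), and the small floor that guards against a silent frame is an added assumption.

```python
import math


def frame_energy(frame):
    """E[k]: sum of squared samples of one frame."""
    return sum(s * s for s in frame)


def frame_rms(frame):
    """RMS[k], so that E[k] equals FL * RMS[k]**2."""
    return math.sqrt(frame_energy(frame) / len(frame))


def decay_energy(e_k, e_decay_prev, decay_factor):
    """E_decay[k]: the current energy or the decayed previous peak, whichever is larger."""
    return max(e_k, decay_factor * e_decay_prev)


def relative_energy_db(e_k, e_decay_k):
    """dB_rel[k]: how far the current energy lies below the decaying local peak."""
    return 10.0 * math.log10(e_decay_k / max(e_k, 1e-12))  # floor is an assumption


# Assumed decay factor for FL = 160 samples at FS = 8000 Hz.
factor = math.exp(-160 / (0.65 * 8000))
e_decay = decay_energy(e_k=10.0, e_decay_prev=200.0, decay_factor=factor)
print(relative_energy_db(10.0, e_decay))
```

A frame well below the recent peak yields a large positive dB_rel[k], which is the case the noise discrimination later tests for.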

The minimum RMS calculating circuit 15C finds the minimum RMS value suitable for evaluating the
background noise level. The minimum short-term RMS values on the frame basis and the minimum
long-term RMS values, that is the minimum RMS values over plural frames, are found. The long-term values
are used when the short-term values cannot track or follow significant changes in the noise level. The minimum
short-term RMS noise value MinNoise_short is calculated by the following equation (7):

$$MinNoise_{short}[k] = \min\left(RMS[k],\ \max\left(\alpha(k)\, e^{\ldots}\, MinNoise_{short}[k-1],\ MinN\ldots\right)\right) \qquad (7)$$

$$\alpha(k) = \begin{cases} 1, & RMS[k] < MAX\_NOISE\_RMS \ \text{and}\ RMS[k] < 3 \cdot MinNoise_{short}[k-1] \\ 0, & \text{otherwise} \end{cases}$$

The minimum short-term RMS noise value MinNoise_short is set so as to be increased for the background
noise, that is the surrounding noise free of speech. While the rate of rise for the high noise level is exponential,
a fixed rise rate is employed for the low noise level for producing a higher rise rate.

The minimum long-term RMS noise value MinNoise_long is calculated for every 0.6 second. MinNoise_long is
the minimum over the previous 1.8 second of frame RMS values which have dB_rel > 19 dB. If in the previous
1.8 second, no RMS values have dB_rel > 19 dB, then MinNoise_long is not used because the previous 1.8 second
of signal may not contain any frames with only background noise. At each 0.6 second interval, if MinNoise_long
> MinNoise_short, then MinNoise_short at that instance is set to MinNoise_long.

The maximum signal calculating circuit 15D calculates the maximum RMS value or the maximum value
of SNR (S/N ratio). The maximum RMS value is used for calculating the optimum or maximum SNR value. For
the maximum RMS value, both the short-term and long-term values are calculated. The short-term maximum
RMS value MaxSignal_short is found from the following equation (8):

$$MaxSignal_{short}[k] = \max\left(RMS[k],\ e^{-\ldots}\, MaxSignal_{short}[k-1]\right) \qquad (8)$$

The maximum long-term RMS value MaxSignal_long is calculated at an interval of e.g., 0.4 second.
This value MaxSignal_long is the maximum value of the frame RMS value during the term of 0.8 second temporally
forward of the current time point. If, during each of the 0.4 second domains, MaxSignal_long is smaller than
MaxSignal_short, MaxSignal_short is set to a value of (0.7·MaxSignal_short + 0.3·MaxSignal_long).

Fig.4 shows illustrative values of the short-term RMS value RMS[k], minimum noise RMS value
MinNoise[k] and the maximum signal RMS value MaxSignal[k]. In Fig.4, the minimum noise RMS value MinNoise[k]
denotes the short-term value of MinNoise_short which takes the long-term value MinNoise_long into account. Also,
the maximum signal RMS value MaxSignal[k] denotes the short-term value of MaxSignal_short which takes the
long-term value MaxSignal_long into account.

The maximum signal SNR value may be estimated by employing the short-term maximum signal RMS value
MaxSignal_short and the short-term minimum noise RMS value MinNoise_short. The noise suppression
characteristics and threshold value for noise domain discrimination are modified on the basis of this estimation for
reducing the possibility of distorting the noise-free clean speech signal. The maximum SNR value MaxSNR
is calculated by the equation:

$$MaxSNR[k] = 20.0 \cdot \log_{10}\left(\frac{\max(1000.0,\ MaxSignal_{short}[k])}{\max(0.5,\ MinNoise_{short}[k])} - 1.0\right) \qquad (9)$$

From the value MaxSNR, the normalized parameter NR_level in a range of from 0 to 1 indicating the relative
noise level is calculated. The following NR_level function is employed:

$$NR\_level[k] = \begin{cases} \left(\frac{1}{2} + \frac{1}{2}\cos\left(\pi \cdot \frac{MaxSNR[k] - 30}{20}\right)\right) \times \left(1 - 0.002\,(MaxSNR[k] - 30)^2\right), & 30 < MaxSNR[k] < 50 \\ 0.0, & MaxSNR[k] \ge 50 \\ 1.0, & \text{otherwise} \end{cases} \qquad (10)$$
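
The NR_level mapping of equation (10) can be sketched as below. This is an illustration only: the function name `nr_level` is hypothetical, and the divisor 20 inside the cosine is an assumption chosen so that the curve falls continuously from 1 at MaxSNR = 30 to 0 at MaxSNR = 50.

```python
import math


def nr_level(max_snr):
    """Relative noise level in [0, 1] derived from the maximum SNR in dB."""
    if 30.0 < max_snr < 50.0:
        d = max_snr - 30.0
        # Raised-cosine roll-off (divisor 20 is assumed) times a quadratic taper.
        return (0.5 + 0.5 * math.cos(math.pi * d / 20.0)) * (1.0 - 0.002 * d * d)
    if max_snr >= 50.0:
        return 0.0  # very clean input
    return 1.0      # noisy input


for snr in (20.0, 35.0, 40.0, 45.0, 55.0):
    print(snr, nr_level(snr))
```

A clean signal (high MaxSNR) therefore drives NR_level towards 0, and a noisy one towards 1.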

The operation of the noise spectrum estimation circuit 15E is explained. The values calculated by the
relative energy calculating circuit 15B, minimum RMS calculating circuit 15C and by the maximum signal
calculating circuit 15D are used for distinguishing the speech from the background noise. If the following conditions
are met, the signal in the k-th frame is classified as being the background noise.



$$\left((RMS[k] < NoiseRMS_{thres}[k])\ \text{or}\ (dB_{rel}[k] > dBthres_{rel}[k])\right)\ \text{and}\ (RMS[k] < RMS[k-1] + 200) \qquad (11)$$

where

$$NoiseRMS_{thres}[k] = \min\left((1.05 + 0.45 \cdot NR\_level[k]) \cdot MinNoise[k],\ MinNoise[k] + Max\_\Delta\_NOISE\_RMS\right)$$

$$dBthres_{rel}[k] = \max\left(MaxSNR[k] - 4.0,\ 0.9 \cdot MaxSNR[k]\right)$$
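
The noise-frame decision of condition (11) may be sketched as follows. This is a hedged illustration: the function name and argument names are hypothetical, and `max_delta_noise_rms` stands for the constant called Max_Δ_NOISE_RMS, whose value is not given in the text and is therefore left as a parameter.

```python
def is_noise_frame(rms_k, rms_prev, db_rel_k, min_noise_k,
                   max_snr_k, nr_level_k, max_delta_noise_rms):
    """Return True when frame k is classified as background noise."""
    # Threshold on the frame RMS, derived from the tracked minimum noise level.
    noise_rms_thres = min((1.05 + 0.45 * nr_level_k) * min_noise_k,
                          min_noise_k + max_delta_noise_rms)
    # Threshold on the relative energy, derived from the maximum SNR.
    db_thres_rel = max(max_snr_k - 4.0, 0.9 * max_snr_k)
    low_level = rms_k < noise_rms_thres or db_rel_k > db_thres_rel
    # A sudden jump in RMS is never treated as noise.
    return low_level and rms_k < rms_prev + 200.0


# Quiet frame close to the noise floor; 50.0 is an assumed constant.
print(is_noise_frame(100.0, 100.0, 0.0, 100.0, 40.0, 1.0, 50.0))
```

The three inputs correspond to the three detection operations combined by the noise spectrum estimation circuit 15E.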



Fig.5 shows illustrative values of the relative energy dB_rel[k], maximum SNR value MaxSNR[k] and the value
of dBthres_rel[k], as one of the threshold values of noise discrimination, in the above equation (11).
Fig.6 shows NR_level[k] as a function of MaxSNR[k] in the equation (10).

If the k-th frame is classified as being the background noise or the noise, the time averaged estimated
value of the noise spectrum N[w, k] is updated by the signal spectrum Y[w, k] of the current frame, as shown
in the following equation (12):

$$N[w,k] = \alpha \cdot \max(N[w,k-1],\ Y[w,k]) + (1 - \alpha) \cdot \min(N[w,k-1],\ Y[w,k]) \qquad (12)$$

$$\alpha = e^{\ldots}$$

where w denotes the band number for the frequency band splitting.

If the k-th frame is classified as the speech, the value of N[w, k-1] is directly used for N[w, k].
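
The noise spectrum update of equation (12), together with the hold rule for speech frames, can be sketched as below. The function name is hypothetical and the weighting coefficient `alpha` is passed in as a parameter, because its defining expression is not legible here; the value 0.75 in the usage line is purely illustrative.

```python
def update_noise_spectrum(n_prev, y_k, is_noise, alpha):
    """Return N[., k] given N[., k-1], the band amplitudes Y[., k] and the frame decision."""
    if not is_noise:
        # Speech frame: the previous estimate is carried over unchanged.
        return list(n_prev)
    # Noise frame: weight the larger of (N, Y) by alpha and the smaller by 1 - alpha.
    return [alpha * max(n, y) + (1.0 - alpha) * min(n, y)
            for n, y in zip(n_prev, y_k)]


print(update_noise_spectrum([10.0, 20.0], [20.0, 10.0], True, 0.75))
print(update_noise_spectrum([10.0, 20.0], [20.0, 10.0], False, 0.75))
```

Because the larger of the two values always receives the weight alpha, the estimate rises faster than it falls whenever alpha exceeds 0.5.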

An output of the noise estimation circuit 15 shown in Fig. 2 is transmitted to a speech estimation circuit
16, a Pr(Sp) calculating circuit 17, a Pr(Sp | Y) calculating circuit 18 and to a maximum likelihood filter 19.

In carrying out arithmetic-logical operations in the noise spectrum estimation circuit 15E of the noise
estimation circuit 15, the arithmetic-logical operations may be carried out using at least one of output data of the
relative energy calculating circuit 15B, minimum RMS calculating circuit 15C and the maximum signal
calculating circuit 15D. Although the data produced by the estimation circuit 15E is lowered in accuracy, a smaller
circuit scale of the noise estimation circuit 15 suffices. Of course, high-accuracy output data of the estimation
circuit 15E may be produced by employing all of the output data of the three calculating circuits 15B, 15C and
15D. However, the arithmetic-logical operations by the estimation circuit 15E may be carried out using outputs
of two of the calculating circuits 15B, 15C and 15D.

The speech estimation circuit 16 calculates the SN ratio on the band basis. The speech estimation circuit
16 is fed with the spectral amplitude data Y[w, k] from the frequency band splitting circuit 14 and the estimated
noise spectral amplitude data N[w, k] from the noise estimation circuit 15. The estimated speech spectral data
S[w, k] is derived based upon these data. A rough estimated value S'[w, k] of the noise-free clean speech
spectrum may be employed for calculating the probability Pr(Sp | Y) as later explained. This value is calculated
by taking the difference of spectral values in accordance with the following equation (13):

$$S'[w,k] = \sqrt{\max\left(0,\ Y[w,k]^2 - \rho \cdot N[w,k]^2\right)} \qquad (13)$$

Then, using the rough estimated value S'[w, k] of the speech spectrum as calculated by the above equation
(13), an estimated value S[w, k] of the speech spectrum, time-averaged on the band basis, is calculated in
accordance with the following equation (14):

$$S[w,k] = \max\left(S'[w,k],\ S'[w,k-1] \cdot decay\_rate\right)$$

$$decay\_rate = \ldots \qquad (14)$$

In the equation (14), the decay_rate shown therein is employed.
The band-based SN ratio is calculated in accordance with the following equation (15):

$$SNR[w,k] = 20 \log_{10}\left(\frac{0.2 \cdot S[w-1,k] + 0.6 \cdot S[w,k] + 0.2 \cdot S[w+1,k]}{0.2 \cdot N[w-1,k] + 0.6 \cdot N[w,k] + 0.2 \cdot N[w+1,k]}\right) \qquad (15)$$

where the estimated value of the noise spectrum N[ ] and the estimated value of the speech spectrum S[ ] may
be found from the equations (12) and (14), respectively.
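
The spectral-difference speech estimate and the band-based SN ratio can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the subtraction weight `rho` is a parameter whose value the text does not state, the 0.2 / 0.6 / 0.2 weighting across neighbouring bands is applied identically to speech and noise, edge bands reuse their own value for the missing neighbour, and a small floor avoids taking the logarithm of zero.

```python
import math


def rough_speech(y, n, rho):
    """S'[w, k]: square root of the clipped power difference, per band."""
    return [math.sqrt(max(0.0, yv * yv - rho * nv * nv)) for yv, nv in zip(y, n)]


def band_snr_db(s, n, w):
    """SNR[w, k] in dB from speech and noise estimates smoothed over adjacent bands."""
    lo = max(w - 1, 0)            # edge handling is an assumption
    hi = min(w + 1, len(s) - 1)
    num = 0.2 * s[lo] + 0.6 * s[w] + 0.2 * s[hi]
    den = 0.2 * n[lo] + 0.6 * n[w] + 0.2 * n[hi]
    return 20.0 * math.log10(max(num, 1e-12) / max(den, 1e-12))


y = [5.0, 13.0, 5.0]
n = [3.0, 5.0, 3.0]
s = rough_speech(y, n, rho=1.0)
print(s)
print(band_snr_db(s, n, 1))
```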

The operation of the Pr(Sp) calculating circuit 17 is explained. The probability Pr(Sp) is the probability of
the speech signals occurring in an assumed input signal. This probability was hitherto fixed perpetually to 0.5.
For a signal having a high SN ratio, the probability Pr(Sp) can be increased for prohibiting sound quality
deterioration. Such probability Pr(Sp) may be calculated in accordance with the following equation (16):

$$Pr(Sp) = 0.5 + 0.45 \cdot (1.0 - NR\_level) \qquad (16)$$

using the NR_level function calculated by the maximum signal calculating circuit 15D.
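
As a brief worked illustration of equation (16), the a priori speech probability moves between 0.5 for the noisiest input and 0.95 for the cleanest. The function name `speech_prior` is hypothetical.

```python
def speech_prior(nr_level):
    """Pr(Sp) as a function of the relative noise level NR_level in [0, 1]."""
    return 0.5 + 0.45 * (1.0 - nr_level)


for level in (1.0, 0.5, 0.0):
    print(level, speech_prior(level))
```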

The operation of the Pr(Sp | Y) calculating circuit 18 is now explained. The value Pr(Sp | Y) is the probability
of the speech signal occurring in the input signal y[t], and is calculated using Pr(Sp) and SNR[w, k]. The value
Pr(Sp | Y) is used for reducing the speech-free domain to a narrower value. For calculations, the method
disclosed in R. J. McAulay and M. L. Malpass, Speech Enhancement Using a Soft-Decision Noise Suppression
Filter, IEEE Trans. Acoust., Speech, and Signal Processing, Vol. ASSP-28, No. 2, April 1980, which is now
explained by referring to equations (17) to (20), was employed.

$$Pr(H1|Y)[w,k] = \frac{Pr(H1) \cdot p(Y|H1)}{Pr(H1) \cdot p(Y|H1) + Pr(H0) \cdot p(Y|H0)} \qquad (17)$$

$$p(Y|H0) = \frac{2Y}{\sigma}\, e^{-Y^2/\sigma} \quad \text{(Rayleigh pdf)} \qquad (18)$$

$$p(Y|H1) = \frac{2Y}{\sigma}\, e^{-(Y^2 + S^2)/\sigma}\, I_0\left(\frac{2 \cdot S \cdot Y}{\sigma}\right) \quad \text{(Rician pdf)} \qquad (19)$$

$$I_0(|x|) = \frac{1}{2\pi}\int_0^{2\pi} e^{|x| \cos\theta}\, d\theta \quad \text{(Modified Bessel function of 1st kind)} \qquad (20)$$

In the above equations (17) to (20), H0 denotes a non-speech event, that is the event that the input signal
y(t) is the noise signal n(t), while H1 denotes a speech event, that is the event that the input signal y(t) is a
sum of the speech signal s(t) and the noise signal n(t) and s(t) is not equal to 0. In addition, w, k, Y, S and σ
denote the band number, frame number, input signal Y[w, k], estimated value of the speech signal S[w, k] and
a square value of the estimated noise signal N[w, k]², respectively.
Pr(H1|Y)[w, k] is calculated from the equation (17), while p(Y|H0) and p(Y|H1) in the equation (17) may
be found from the equations (18) and (19). The Bessel function I0(|x|) is calculated from the equation (20).
The Bessel function may be approximated by the following function (21):

$$I_0(|x|) = \begin{cases} \frac{1}{\sqrt{2\pi|x|}}\, e^{|x|} + 0.07, & \text{if } |x| \ge 0.5 \\ 1, & \text{otherwise} \end{cases} \qquad (21)$$
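
The soft-decision probability Pr(H1|Y) can be sketched from the Rayleigh and Rician densities and the approximated Bessel function. This is a hedged illustration: the function names are hypothetical, the densities follow the standard McAulay and Malpass forms with σ taken as the squared noise estimate, the 0.07 offset is read as an additive term, and returning the prior when both densities vanish is an added assumption. For very large arguments `math.exp` would overflow, which a practical implementation would need to guard against.

```python
import math


def bessel_i0_approx(x):
    """Piecewise approximation of the modified Bessel function I0(|x|)."""
    ax = abs(x)
    if ax >= 0.5:
        return math.exp(ax) / math.sqrt(2.0 * math.pi * ax) + 0.07
    return 1.0


def speech_posterior(y, s, sigma, pr_sp):
    """Pr(H1|Y) for one band, given amplitude y, speech estimate s and noise power sigma."""
    p_h0 = (2.0 * y / sigma) * math.exp(-y * y / sigma)                  # Rayleigh pdf
    p_h1 = ((2.0 * y / sigma) * math.exp(-(y * y + s * s) / sigma)
            * bessel_i0_approx(2.0 * s * y / sigma))                     # Rician pdf
    num = pr_sp * p_h1
    den = num + (1.0 - pr_sp) * p_h0
    return num / den if den > 0.0 else pr_sp  # fallback is an assumption


print(speech_posterior(3.0, 3.0, 1.0, 0.5))
print(speech_posterior(0.5, 0.0, 1.0, 0.5))
```

When the speech estimate is zero the two densities coincide and the posterior reduces to the prior, which is a convenient sanity check.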



Heretofore, a fixed value of the SN ratio, such as SNR = 5, was employed for deriving Pr(H1|Y) without
employing the estimated speech signal value S[w, k]. Consequently, p(Y|H1) was simplified as shown by the
following equation (22):

$$p(Y|H1) = \frac{2Y}{\sigma}\, e^{-Y^2/\sigma - SNR^2}\, I_0\left(2 \cdot SNR \cdot \frac{Y}{\sqrt{\sigma}}\right) \qquad (22)$$
A signal having an instantaneous SN ratio lower than the value SNR of the SN ratio employed in the
calculation of p(Y|H1) is suppressed significantly. If it is assumed that the value SNR of the SN ratio is set to an
excessively high value, the speech corrupted by a noise of a lower level is excessively lowered in its low-level
speech portion, so that the produced speech becomes unnatural. Conversely, if the value SNR of the SN ratio
is set to an excessively low value, the speech corrupted by the larger level noise is low in suppression and
sounds noisy even at its low-level portion. Thus the value of p(Y|H1) conforming to a wide range of the
background/speech level is obtained by using the variable value of the SN ratio SNR_new[w, k] as in the present
embodiment instead of by using the fixed value of the SN ratio. The value of SNR_new[w, k] may be found from the
following equation (23):

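$$SNR_{new}[w,k] = \max\left(MIN\_SNR(SNR[w,k]),\ \ldots\right) \qquad (23)$$

in which the value of MIN_SNR is found from the equation (24):

$$MIN\_SNR(x) = \begin{cases} 3, & x < 10 \\ 3 - \frac{x - 10}{35} \cdot 1.5, & 10 \le x \le 45 \\ 1.5, & \text{otherwise} \end{cases} \qquad (24)$$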
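
The floor function MIN_SNR of equation (24) can be sketched as below. The function name `min_snr` is hypothetical, and the slope of the middle segment is an assumption chosen so that the value falls linearly from 3 at x = 10 to 1.5 at x = 45.

```python
def min_snr(x):
    """Lower limit placed on the instantaneous SN ratio, as a function of the band SNR x in dB."""
    if x < 10.0:
        return 3.0
    if x <= 45.0:
        return 3.0 - (x - 10.0) / 35.0 * 1.5  # assumed linear interpolation
    return 1.5


for x in (0.0, 10.0, 27.5, 45.0, 60.0):
    print(x, min_snr(x))
```

A signal with a high SN ratio on the whole thus gets the lower floor of 1.5, while a noisy signal keeps the floor at 3.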

The value SNR_new[w, k] is an instantaneous SNR in the k-th frame in which limitation is placed on the
minimum value. The value of SNR_new[w, k] may be decreased to 1.5 for a signal having the high SN ratio on the
whole. In such case, suppression is not done on segments having low instantaneous SN ratio. The value
SNR_new[w, k] cannot be lowered to below 3 for a signal having a low instantaneous SN ratio as a whole.
Consequently, sufficient suppression may be assured for segments having a low instantaneous S/N ratio.

The operation of the maximum likelihood filter 19 is explained. The maximum likelihood filter 19 is one of
pre-filters provided for freeing the respective bands of the input signal of noise signals. In the maximum likelihood
filter 19, the spectral amplitude data Y[w, k] from the frequency band splitting circuit 14 is converted into a signal
H[w, k] using the noise spectral amplitude data N[w, k] from the noise estimation circuit 15. The signal H[w,
k] is calculated in accordance with the following equation (25):

$$H[w,k] = \begin{cases} \alpha + (1 - \alpha) \cdot \frac{\sqrt{Y^2 - N^2}}{Y}, & Y > 0 \text{ and } Y \ge N \\ \alpha, & \text{otherwise} \end{cases} \qquad (25)$$

where α = 0.7 - 0.4·NR_level[k].

Although the value α in the above equation (25) is conventionally set to 1/2, the degree of noise suppression
may be varied depending on the maximum SNR because an approximate value of the SNR is known.
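
The maximum likelihood filter gain can be sketched for a single band as follows. This is an illustration under assumptions: the function name is hypothetical, and the gain is written in the usual maximum likelihood form in which the conventional weight of 1/2 is replaced by α = 0.7 - 0.4·NR_level[k].

```python
import math


def ml_filter_gain(y, n, nr_level):
    """H[w, k] for band amplitude y, noise estimate n and relative noise level nr_level."""
    alpha = 0.7 - 0.4 * nr_level
    if y > 0.0 and y >= n:
        return alpha + (1.0 - alpha) * math.sqrt(y * y - n * n) / y
    return alpha  # band at or below the noise estimate


print(ml_filter_gain(5.0, 3.0, 0.0))
print(ml_filter_gain(1.0, 2.0, 1.0))
```

With a clean input (NR_level near 0) the gain floor is 0.7, so little is removed; with a noisy input (NR_level near 1) the floor drops to 0.3 and suppression is stronger.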

The operation of a soft decision suppression circuit 20 is now explained. The soft decision suppression
circuit 20 is one of pre-filters for enhancing the speech portion of the signal. Conversion is done by the method
shown in the following equation (26) using the signal H[w, k] and the value Pr(H1|Y) from the Pr(Sp | Y)
calculating circuit 18:

$$H[w,k] \leftarrow Pr(H1|Y)[w,k] \cdot H[w,k] + (1 - Pr(H1|Y)[w,k]) \cdot MIN\_GAIN \qquad (26)$$

In the above equation (26), MIN_GAIN is a parameter indicating the minimum gain, and may be set to, for
example, 0.1, that is -15 dB.
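
The soft decision step of equation (26) is a probability-weighted blend between the filter gain and the minimum gain, which the following sketch illustrates. The function name is hypothetical; MIN_GAIN uses the example value 0.1 given in the text.

```python
MIN_GAIN = 0.1  # example value from the text


def soft_decision(h, pr_h1):
    """Blend the filter gain h with MIN_GAIN according to the speech probability pr_h1."""
    return pr_h1 * h + (1.0 - pr_h1) * MIN_GAIN


for p in (0.0, 0.5, 1.0):
    print(p, soft_decision(0.9, p))
```

When speech is certain the gain is left untouched, and when speech is certainly absent the band is attenuated to the minimum gain.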

The operation of a filter processing circuit 21 is now explained. The signal H[w, k] from the soft decision
suppression circuit 20 is filtered along both the frequency axis and the time axis. The filtering along the
frequency axis has the effect of shortening the effective impulse response length of the signal H[w, k]. This
eliminates any circular convolution aliasing effects associated with filtering by multiplication in the frequency
domain. The filtering along the time axis has the effect of limiting the rate of change of the filter in suppressing
noise bursts.

The filtering along the frequency axis is now explained. Median filtering is done on the signals H[w, k] of
each of 18 bands resulting from frequency band division. The method is explained by the following equations
(27) and (28):

Step 1 : H1[w, k] = max(median(H[w - 1, k], H[w, k], H[w + 1, k]), H[w, k]) (27)
where H1[w, k] = H[w, k] if (w-1) or (w+1) is absent

Step 2 : H2[w, k] = min(median(H1[w - 1, k], H1[w, k], H1[w + 1, k]), H1[w, k]) (28)
where H2[w, k] = H1[w, k] if (w-1) or (w+1) is absent

In the step 1, H1[w, k] is H[w, k] without single band nulls. In the step 2, H2[w, k] is H1[w, k] without single
band spikes. The signal resulting from filtering along the frequency axis is H2[w, k].
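
The two-step median filtering along the frequency axis can be sketched as below. This is an illustration with hypothetical function names; it assumes that step 2 operates on the output of step 1, as the description of H2 as "H1 without single band spikes" suggests, and it leaves the first and last bands unchanged because one neighbour is absent there.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def smooth_along_frequency(h):
    """Remove single-band nulls (step 1) and single-band spikes (step 2) from the band gains."""
    n = len(h)
    h1 = list(h)
    for w in range(1, n - 1):
        h1[w] = max(median3(h[w - 1], h[w], h[w + 1]), h[w])      # fills isolated nulls
    h2 = list(h1)
    for w in range(1, n - 1):
        h2[w] = min(median3(h1[w - 1], h1[w], h1[w + 1]), h1[w])  # clips isolated spikes
    return h2


print(smooth_along_frequency([0.5, 0.0, 0.5, 0.5]))
print(smooth_along_frequency([0.2, 0.9, 0.2, 0.2]))
```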
Next, the filtering along the time axis is explained. The filtering along the time axis considers three states of
the input speech signal, namely the speech, the background noise and the transient which is the rising portion
of the speech. The speech signal is smoothed along the time axis as shown by the following equation (29):

$$H_{speech}[w,k] = 0.7 \cdot H2[w,k] + 0.3 \cdot H2[w,k-1] \qquad (29)$$

The background noise signal is smoothed along the time axis as shown by the following equation (30):

$$H_{noise}[w,k] = 0.7 \cdot Min\_H + 0.3 \cdot Max\_H \qquad (30)$$

where Min_H and Max_H are:

Min_H = min(H2[w, k], H2[w, k - 1])
Max_H = max(H2[w, k], H2[w, k - 1])

For transient signals, no smoothing along the time axis is performed. Ultimately, calculations are carried out for
producing the smoothed output signal H_t_smooth[w, k] by the following equation (31):

$$H_{t\_smooth}[w,k] = (1 - \alpha_{tr}) \cdot \left(\alpha_{sp} \cdot H_{speech}[w,k] + (1 - \alpha_{sp}) \cdot H_{noise}[w,k]\right) + \alpha_{tr} \cdot H2[w,k] \qquad (31)$$

α_sp and α_tr in the equation (31) are respectively found from the equations (32) and (33):

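$$\alpha_{sp} = \begin{cases} 1.0, & SNR_{inst} > 4.0 \\ (SNR_{inst} - 1) \cdot \frac{1}{3}, & 1.0 < SNR_{inst} < 4.0 \\ 0, & \text{otherwise} \end{cases} \qquad (32)$$

where

$$SNR_{inst} = \frac{\ldots}{MinNoise[k]}$$

$$\alpha_{tr} = \begin{cases} \ldots \\ (\ldots - 2) \cdot \ldots, & 2.0 < \ldots < 3\ldots \\ 0, & \text{otherwise} \end{cases} \qquad (33)$$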
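
The time-axis smoothing of equations (29) to (31) can be sketched for one band as follows. The function name is hypothetical, and the mixing weights `alpha_sp` and `alpha_tr` are taken as inputs in the range 0 to 1 rather than computed, since their defining expressions are only partly legible here.

```python
def smooth_along_time(h2_k, h2_prev, alpha_sp, alpha_tr):
    """Blend speech-style, noise-style and unsmoothed gains for one band."""
    h_speech = 0.7 * h2_k + 0.3 * h2_prev                           # equation (29)
    h_noise = 0.7 * min(h2_k, h2_prev) + 0.3 * max(h2_k, h2_prev)   # equation (30)
    smoothed = alpha_sp * h_speech + (1.0 - alpha_sp) * h_noise
    # A transient (alpha_tr = 1) bypasses smoothing entirely.      # equation (31)
    return (1.0 - alpha_tr) * smoothed + alpha_tr * h2_k


print(smooth_along_time(1.0, 0.0, alpha_sp=1.0, alpha_tr=0.0))
print(smooth_along_time(1.0, 0.0, alpha_sp=0.0, alpha_tr=0.0))
print(smooth_along_time(1.0, 0.0, alpha_sp=0.0, alpha_tr=1.0))
```

For a rising gain the noise-style branch weights the smaller value more heavily, so the filter opens slowly during noise but follows speech onsets when the transient weight is high.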

The operation in a band conversion circuit 22 is explained. The 18 band signals H_t_smooth[w, k] from the
filtering circuit 21 are interpolated to e.g., 128 band signals H_128[w, k]. The interpolation is done in two stages,
that is, the interpolation from 18 to 64 bands is done by zero-order hold and the interpolation from 64 to 128
bands is done by a low-pass filter interpolation.

The operation in a spectrum correction circuit 23 is explained. The real part and the imaginary part of the
FFT coefficients of the input signal obtained at the FFT circuit 13 are multiplied with the above signal H128[w,
k] to carry out spectrum correction. The result is that the spectral amplitude is corrected, while the spectrum
is not modified in phase.
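
As an illustration only, the spectrum correction can be sketched in Python. The function name correct_spectrum is hypothetical, and the gains are assumed to be real and positive, so that scaling both parts of each coefficient changes its amplitude and leaves its phase untouched.

```python
import cmath


def correct_spectrum(coefficients, gains):
    """Multiply the real and imaginary parts of each FFT coefficient by a real gain."""
    return [complex(c.real * g, c.imag * g) for c, g in zip(coefficients, gains)]


# Usage: one coefficient scaled by a gain of 0.5.
original = [complex(3.0, 4.0)]
corrected = correct_spectrum(original, [0.5])
amplitude_ratio = abs(corrected[0]) / abs(original[0])
phase_change = cmath.phase(corrected[0]) - cmath.phase(original[0])
```

Here amplitude_ratio equals the gain and phase_change is zero up to rounding.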

An IFFT circuit 24 executes inverse FFT on the signal obtained at the spectrum correction circuit 23. 

An overlap-and-add circuit 25 overlaps and adds the frame boundary portions of the frame-based IFFT
output signals. A noise-reduced output signal is obtained at an output terminal 26 by the procedure described
above.
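
As an illustration only, the overlap-and-add operation can be sketched in Python. The function name overlap_add and the hop parameter (the number of samples by which successive frames are shifted) are hypothetical; the frame length and the amount of overlap are not taken from this passage.

```python
def overlap_add(frames, hop):
    """Sum time-domain frames into one signal, shifting each frame by hop samples."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        start = k * hop
        for i, sample in enumerate(frame):
            # Samples in the overlapping frame boundary portions are added together.
            out[start + i] += sample
    return out
```

For two frames of four samples with a hop of two, the middle two output samples are the sums of the overlapping portions of both frames.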

The output signal thus obtained is transmitted to various encoders of a portable telephone set or to a signal
processing circuit of a speech recognition device. Alternatively, decoder output signals of a portable telephone
set may be processed with noise reduction according to the present invention.

The present invention is not limited to the above embodiment. For example, the above-described filtering
by the filtering circuit 21 may be employed in the conventional noise suppression technique employing the
maximum likelihood filter. The noise domain detection method by the filter processing circuit 15 may be employed
in a variety of devices other than the noise suppression device.



Claims 

1. A method for reducing the noise in an input speech signal in which noise suppression is done by adaptively
controlling a maximum likelihood filter adapted for calculating speech components based on the
probability of speech occurrence and the S/N ratio calculated based on the input speech signal, wherein the
improvement comprises

employing the spectrum of an input signal less an estimated noise spectrum in calculating the
probability of speech occurrence.

2. The method as claimed in claim 1, wherein the value of the above difference or a pre-set value, whichever
is larger, is employed for calculating the probability of speech occurrence.

3. The method as claimed in claim 1, wherein the value of the above difference or a pre-set value, whichever
is larger, is found for the current frame and for a previous frame, the value for the previous frame is
multiplied with a pre-set decay coefficient, and the value for the current frame or the value for the previous
frame multiplied by a pre-set decay coefficient, whichever is larger, is employed for calculating the
probability of speech occurrence.

4. The method as claimed in claim 1, 2 or 3, wherein characteristics of the maximum likelihood filter are
processed with smoothing filtering along the frequency axis and along the time axis.

5. The method as claimed in claim 1, 2, 3 or 4, wherein the noise domain is detected for finding the probability
of speech occurrence by comparing the frame-based RMS values to a threshold value Th1, a value th for
finding the threshold value Th1 is found responsive to the RMS value for the current frame or the value
th of the previous frame multiplied with a coefficient a, whichever is smaller, and the coefficient a is
changed over depending on the RMS value for the current frame.

6. The method as claimed in claim 5, wherein the value th for finding the threshold value Th1 is found by
employing a smaller one of the RMS value of the current frame and the value th of a previous frame
multiplied by a coefficient a, whichever is smaller, or the minimum value of the RMS values over plural frames,
whichever is larger.

7. The method as claimed in claim 6, wherein the noise domain detection is done by discriminating the rel- 
ative energy of the current frame using a threshold value Th2 calculated using the maximum SN ratio of 
the input speech signal. 

8. A method for reducing the noise in an input speech signal in which noise suppression is done by adaptively
controlling a maximum likelihood filter adapted for calculating speech components based on the
probability of speech occurrence and the S/N ratio calculated based on the input speech signal, wherein the
improvement comprises

smoothing filtering the characteristics of the maximum likelihood filter along the frequency axis
and along the time axis.

9. The method as claimed in claim 8, wherein a median value of characteristics of the maximum likelihood
filter in the frequency range under consideration and characteristics of the maximum likelihood filter in
neighbouring left and right frequency ranges is used for smoothing filtering along the frequency axis.

10. The method as claimed in claim 8 or 9, wherein the smoothing filtering along the frequency axis comprises
the steps of

selecting the median value or the characteristics of the maximum likelihood filter in the frequency
range under consideration, whichever is larger,

the median value for the frequency range under consideration corresponding to the processing
results or the characteristics of the maximum likelihood filter in the frequency range under consideration,
whichever is smaller.

11. The method as claimed in claim 9 or 10, wherein the smoothing filtering along the time axis includes
smoothing for signals of the speech part and smoothing for signals of the noise part.

12. A method for detecting a noise domain by dividing an input speech signal on the frame basis, finding an
RMS value on the frame basis and comparing the RMS values to a threshold value Th1 for detecting the
noise domain, wherein the improvement comprises

calculating a value th for finding the threshold Th1 using the RMS value for the current frame and
a value th of the previous frame multiplied by a coefficient a, whichever is smaller, and changing over
the coefficient a depending on an RMS value of the current frame.

13. The method as claimed in claim 12, comprising calculating a value th for finding the threshold Th1 using
a smaller one of the RMS value for the current frame and a value th of the previous frame multiplied by
a coefficient a, or the smallest RMS value over plural frames, whichever is larger.

14. The method as claimed in claim 13, wherein the noise domain is detected based upon the results of
discrimination of the relative energy of the current frame using the threshold value Th2 calculated using the
maximum SN ratio of the input speech signal and the results of comparison of the RMS value to the
threshold value Th1.